Goto

Collaborating Authors

 Medina Province


ArbESC+: Arabic Enhanced Edit Selection System Combination for Grammatical Error Correction Resolving conflict and improving system combination in Arabic GEC

Alrehili, Ahlam, Alhothali, Areej

arXiv.org Artificial Intelligence

Grammatical Error Correction (GEC) is an important aspect of natural language processing. Arabic has a complicated morphological and syntactic structure, posing a greater challenge than other languages. Even though modern neural models have improved greatly in recent years, the majority of previous attempts used individual models without taking into account the potential benefits of combining different systems. In this paper, we present one of the first multi-system approaches for correcting grammatical errors in Arabic, the Arab Enhanced Edit Selection System Complication (ArbESC+). Several models are used to collect correction proposals, which are represented as numerical features in the framework. A classifier determines and implements the appropriate corrections based on these features. In order to improve output quality, the framework uses support techniques to filter overlapping corrections and estimate decision reliability. A combination of AraT5, ByT5, mT5, AraBART, AraBART+Morph+GEC, and Text editing systems gave better results than a single model alone, with F0.5 at 82.63% on QALB-14 test data, 84.64% on QALB-15 L1 data, and 65.55% on QALB-15 L2 data. As one of the most significant contributions of this work, it's the first Arab attempt to integrate linguistic error correction. Improving existing models provides a practical step towards developing advanced tools that will benefit users and researchers of Arabic text processing.


Comprehensive Analysis of VQC for Financial Fraud Detection: A Comparative Study of Quantum Encoding Techniques and Architectural Optimizations

Abbou, Fouad Mohammed, Bouhadda, Mohamed, Bouanane, Lamiae, Kettani, Mouna, Abdi, Farid, Abid, Abdelouahab

arXiv.org Artificial Intelligence

This paper presents a systematic comparative analysis of Variational Quantum Classifier (VQC) configurations for financial fraud detection, encompassing three distinct quantum encoding techniques and comprehensive architectural variations. Through empirical evaluation across multiple entanglement patterns, circuit depths, and optimization strategies,quantum advantages in fraud classification accuracy are demonstrated, achieving up to 94.3 % accuracy with ZZ encoding schemes. The analysis reveals significant performance variations across entanglement topologies, with circular entanglement consistently outperforming linear (90.7) %) and full connectivity (92.0 %) patterns, achieving optimal performance at 93.3 % accuracy. The study introduces novel visualization methodologies for quantum circuit analysis and provides actionable deployment recommendations for practical quantum machine learning implementations. Notably, systematic entanglement pattern analysis shows that circular connectivity provides superior balance between expressivity and trainability while maintaining computational efficiency. These researches offer initial benchmarks for quantum enhanced fraud detection systems and propose potential benefits of quantum machine learning in financial security applications.


Social-Sensor Identity Cloning Detection Using Weakly Supervised Deep Forest and Cryptographic Authentication

Alharbi, Ahmed, Dong, Hai, Yi, Xun

arXiv.org Artificial Intelligence

Recent years have witnessed a rising trend in social-sensor cloud identity cloning incidents. However, existing approaches suffer from unsatisfactory performance, a lack of solutions for detecting duplicated accounts, and a lack of large-scale evaluations on real-world datasets. We introduce a novel method for detecting identity cloning in social-sensor cloud service providers. Our proposed technique consists of two primary components: 1) a similar identity detection method and 2) a cryptography-based authentication protocol. Initially, we developed a weakly supervised deep forest model to identify similar identities using non-privacy-sensitive user profile features provided by the service. Subsequently, we designed a cryptography-based authentication protocol to verify whether similar identities were generated by the same provider. Our extensive experiments on a large real-world dataset demonstrate the feasibility and superior performance of our technique compared to current state-of-the-art identity clone detection methods.


Machine Learning-Based Quantification of Vesicoureteral Reflux with Enhancing Accuracy and Efficiency

Alqaraleh, Muhyeeddin, Alzboon, Mowafaq Salem, Al-Batah, Mohammad Subhi, Aesa, Lana Yasin Al, Abu-Arqoub, Mohammed Hasan, Marie, Rashiq Rafiq, Alsmad, Firas Hussein

arXiv.org Artificial Intelligence

Vesicoureteral reflux (VUR) is traditionally assessed using subjective grading systems, which introduces variability in diagnosis. This study investigates the use of machine learning to improve diagnostic consistency by analyzing voiding cystourethrogram (VCUG) images. A total of 113 VCUG images were reviewed, with expert grading of VUR severity. Nine image-based features were selected to train six predictive models: Logistic Regression, Decision Tree, Gradient Boosting, Neural Network, and Stochastic Gradient Descent. The models were evaluated using leave-one-out cross-validation. Analysis identified deformation patterns in the renal calyces as key indicators of high-grade VUR. All models achieved accurate classifications with no false positives or negatives. High sensitivity to subtle image patterns characteristic of different VUR grades was confirmed by substantial Area Under the Curve (AUC) values. The results suggest that machine learning can offer an objective and standardized alternative to current subjective VUR assessments. These findings highlight renal calyceal deformation as a strong predictor of severe cases. Future research should aim to expand the dataset, refine imaging features, and improve model generalizability for broader clinical use.


Classifying Dental Care Providers Through Machine Learning with Features Ranking

Al-Batah, Mohammad Subhi, Alzboon, Mowafaq Salem, Alqaraleh, Muhyeeddin, Abu-Arqoub, Mohammed Hasan, Marie, Rashiq Rafiq

arXiv.org Artificial Intelligence

This study investigates the application of machine learning (ML) models for classifying dental providers into two categories - standard rendering providers and safety net clinic (SNC) providers - using a 2018 dataset of 24,300 instances with 20 features. The dataset, characterized by high missing values (38.1%), includes service counts (preventive, treatment, exams), delivery systems (FFS, managed care), and beneficiary demographics. Feature ranking methods such as information gain, Gini index, and ANOVA were employed to identify critical predictors, revealing treatment-related metrics (TXMT_USER_CNT, TXMT_SVC_CNT) as top-ranked features. Twelve ML models, including k-Nearest Neighbors (kNN), Decision Trees, Support Vector Machines (SVM), Stochastic Gradient Descent (SGD), Random Forest, Neural Networks, and Gradient Boosting, were evaluated using 10-fold cross-validation. Classification accuracy was tested across incremental feature subsets derived from rankings. The Neural Network achieved the highest accuracy (94.1%) using all 20 features, followed by Gradient Boosting (93.2%) and Random Forest (93.0%). Models showed improved performance as more features were incorporated, with SGD and ensemble methods demonstrating robustness to missing data. Feature ranking highlighted the dominance of treatment service counts and annotation codes in distinguishing provider types, while demographic variables (AGE_GROUP, CALENDAR_YEAR) had minimal impact. The study underscores the importance of feature selection in enhancing model efficiency and accuracy, particularly in imbalanced healthcare datasets. These findings advocate for integrating feature-ranking techniques with advanced ML algorithms to optimize dental provider classification, enabling targeted resource allocation for underserved populations.


SaudiCulture: A Benchmark for Evaluating Large Language Models Cultural Competence within Saudi Arabia

Ayash, Lama, Alhuzali, Hassan, Alasmari, Ashwag, Aloufi, Sultan

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have demonstrated remarkable capabilities in natural language processing; however, they often struggle to accurately capture and reflect cultural nuances. This research addresses this challenge by focusing on Saudi Arabia, a country characterized by diverse dialects and rich cultural traditions. We introduce SaudiCulture, a novel benchmark designed to evaluate the cultural competence of LLMs within the distinct geographical and cultural contexts of Saudi Arabia. SaudiCulture is a comprehensive dataset of questions covering five major geographical regions, such as West, East, South, North, and Center, along with general questions applicable across all regions. The dataset encompasses a broad spectrum of cultural domains, including food, clothing, entertainment, celebrations, and crafts. To ensure a rigorous evaluation, SaudiCulture includes questions of varying complexity, such as open-ended, single-choice, and multiple-choice formats, with some requiring multiple correct answers. Additionally, the dataset distinguishes between common cultural knowledge and specialized regional aspects. We conduct extensive evaluations on five LLMs, such as GPT-4, Llama 3.3, FANAR, Jais, and AceGPT, analyzing their performance across different question types and cultural contexts. Our findings reveal that all models experience significant performance declines when faced with highly specialized or region-specific questions, particularly those requiring multiple correct responses. Additionally, certain cultural categories are more easily identifiable than others, further highlighting inconsistencies in LLMs cultural understanding. These results emphasize the importance of incorporating region-specific knowledge into LLMs training to enhance their cultural competence.


Towards the Development of Balanced Synthetic Data for Correcting Grammatical Errors in Arabic: An Approach Based on Error Tagging Model and Synthetic Data Generating Model

Alrehili, Ahlam, Alhothali, Areej

arXiv.org Artificial Intelligence

Synthetic data generation is widely recognized as a way to enhance the quality of neural grammatical error correction (GEC) systems. However, current approaches often lack diversity or are too simplistic to generate the wide range of grammatical errors made by humans, especially for low-resource languages such as Arabic. In this paper, we will develop the error tagging model and the synthetic data generation model to create a large synthetic dataset in Arabic for grammatical error correction. In the error tagging model, the correct sentence is categorized into multiple error types by using the DeBERTav3 model. Arabic Error Type Annotation tool (ARETA) is used to guide multi-label classification tasks in an error tagging model in which each sentence is classified into 26 error tags. The synthetic data generation model is a back-translation-based model that generates incorrect sentences by appending error tags before the correct sentence that was generated from the error tagging model using the ARAT5 model. In the QALB-14 and QALB-15 Test sets, the error tagging model achieved 94.42% F1, which is state-of-the-art in identifying error tags in clean sentences. As a result of our syntactic data training in grammatical error correction, we achieved a new state-of-the-art result of F1-Score: 79.36% in the QALB-14 Test set. We generate 30,219,310 synthetic sentence pairs by using a synthetic data generation model.


Cloned Identity Detection in Social-Sensor Clouds based on Incomplete Profiles

Alharbi, Ahmed, Dong, Hai, Yi, Xun, Abeysekara, Prabath

arXiv.org Artificial Intelligence

We propose a novel approach to effectively detect cloned identities of social-sensor cloud service providers (i.e. social media users) in the face of incomplete non-privacy-sensitive profile data. Named ICD-IPD, the proposed approach first extracts account pairs with similar usernames or screen names from a given set of user accounts collected from a social media. It then learns a multi-view representation associated with a given account and extracts two categories of features for every single account. These two categories of features include profile and Weighted Generalised Canonical Correlation Analysis (WGCCA)-based features that may potentially contain missing values. To counter the impact of such missing values, a missing value imputer will next impute the missing values of the aforementioned profile and WGCCA-based features. After that, the proposed approach further extracts two categories of augmented features for each account pair identified previously, namely, 1) similarity and 2) differences-based features. Finally, these features are concatenated and fed into a Light Gradient Boosting Machine classifier to detect identity cloning. We evaluated and compared the proposed approach against the existing state-of-the-art identity cloning approaches and other machine or deep learning models atop a real-world dataset. The experimental results show that the proposed approach outperforms the state-of-the-art approaches and models in terms of Precision, Recall and F1-score.


Movement Control of Smart Mosque's Domes using CSRNet and Fuzzy Logic Techniques

Blasi, Anas H., Lababede, Mohammad Awis Al, Alsuwaiket, Mohammed A.

arXiv.org Artificial Intelligence

Mosques are worship places of Allah and must be preserved clean, immaculate, provide all the comforts of the worshippers in them. The prophet's mosque in Medina/ Saudi Arabia is one of the most important mosques for Muslims. It occupies second place after the sacred mosque in Mecca/ Saudi Arabia, which is in constant overcrowding by all Muslims to visit the prophet Mohammad's tomb. This paper aims to propose a smart dome model to preserve the fresh air and allow the sunlight to enter the mosque using artificial intelligence techniques. The proposed model controls domes movements based on the weather conditions and the overcrowding rates in the mosque. The data have been collected from two different resources, the first one from the database of Saudi Arabia weather's history, and the other from Shanghai Technology Database. Congested Scene Recognition Network (CSRNet) and Fuzzy techniques have applied using Python programming language to control the domes to be opened and closed for a specific time to renew the air inside the mosque. Also, this model consists of several parts that are connected for controlling the mechanism of opening/closing domes according to weather data and the situation of crowding in the mosque. Finally, the main goal of this paper has been achieved, and the proposed model has worked efficiently and specifies the exact duration time to keep the domes open automatically for a few minutes for each hour head.


An Effective Weight Initialization Method for Deep Learning: Application to Satellite Image Classification

Boulila, Wadii, Alshanqiti, Eman, Alzahem, Ayyub, Koubaa, Anis, Mlaiki, Nabil

arXiv.org Artificial Intelligence

The growing interest in satellite imagery has triggered the need for efficient mechanisms to extract valuable information from these vast data sources, providing deeper insights. Even though deep learning has shown significant progress in satellite image classification. Nevertheless, in the literature, only a few results can be found on weight initialization techniques. These techniques traditionally involve initializing the networks' weights before training on extensive datasets, distinct from fine-tuning the weights of pre-trained networks. In this study, a novel weight initialization method is proposed in the context of satellite image classification. The proposed weight initialization method is mathematically detailed during the forward and backward passes of the convolutional neural network (CNN) model. Extensive experiments are carried out using six real-world datasets. Comparative analyses with existing weight initialization techniques made on various well-known CNN models reveal that the proposed weight initialization technique outperforms the previous competitive techniques in classification accuracy. The complete code of the proposed technique, along with the obtained results, is available at https://github.com/WadiiBoulila/Weight-Initialization